NSF PAR Search | NSF Public Access Repository

Aioli: A Unified Optimization Framework for Language Model Data Mixing

Chen, Mayee F; Hu, Michael Y; Lourie, Nicholas; Cho, Kyunghyun; Ré, Christopher (April 2024, 2025 International Conference on Learning Representations)

Language model performance depends on identifying the optimal mixture of data groups to train on (e.g., law, code, math). Prior work has proposed a diverse set of methods to efficiently learn mixture proportions, ranging from fitting regression models over training runs to dynamically updating proportions throughout training. Surprisingly, we find that no existing method consistently outperforms a simple stratified sampling baseline in terms of average test perplexity. To understand this inconsistency, we unify existing methods into a standard framework, showing they are equivalent to solving a common optimization problem: minimize average loss subject to a method-specific mixing law -- an implicit assumption on the relationship between loss and mixture proportions. This framework suggests that measuring the fidelity of a method's mixing law can offer insights into its performance. Empirically, we find that existing methods set their mixing law parameters inaccurately, resulting in the inconsistent mixing performance we observe. Using this insight, we derive a new online method named Aioli, which directly estimates the mixing law parameters throughout training and uses them to dynamically adjust proportions. Aioli outperforms stratified sampling on 6 out of 6 datasets by an average of 0.27 test perplexity points, whereas existing methods fail to consistently beat stratified sampling, doing up to 6.9 points worse. Moreover, in a practical setting where proportions are learned on shorter runs due to computational constraints, Aioli can dynamically adjust these proportions over the full training run, consistently improving performance over existing methods by up to 12.012 test perplexity points.
Full Text Available
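
The Aioli abstract above describes estimating mixing-law parameters during training and using them to adjust mixture proportions dynamically. The sketch below is not the paper's algorithm. It illustrates one generic way such an adjustment could look, assuming a hypothetical matrix `A` where `A[i][j]` is the estimated loss reduction on group `i` per unit of proportion placed on group `j`. The function name `update_proportions`, the step size `eta`, the exponentiated-gradient form of the update, and the toy values are all assumptions made for illustration.

```python
import math

def update_proportions(p, A, eta=0.5):
    """One exponentiated-gradient step on mixture proportions.

    p:   current proportions over data groups (non-negative, sum to 1).
    A:   hypothetical mixing-law parameters; A[i][j] is the assumed loss
         reduction on group i per unit of proportion given to group j.
    eta: assumed step size.
    """
    m = len(p)
    # Predicted total loss reduction, summed over all groups, from group j.
    gain = [sum(A[i][j] for i in range(m)) for j in range(m)]
    # Upweight groups with larger predicted gain, then renormalize.
    weights = [p[j] * math.exp(eta * gain[j]) for j in range(m)]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example with three groups, starting from a uniform mixture.
p = [1 / 3, 1 / 3, 1 / 3]
A = [
    [1.0, 0.2, 0.0],
    [0.1, 0.5, 0.0],
    [0.0, 0.1, 0.3],
]
new_p = update_proportions(p, A)
```

In this toy setup the first group has the largest column sum in `A`, so it receives the largest share after the update. In an online setting, a step like this would be repeated as the parameter estimates are refreshed.
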
Context-Aware Meta-Learning

Fifty, Christopher; Duan, Dennis; Junkins, Ronald; Amid, Ehsan; Leskovec, Jure; Ré, Christopher; Thrun, Sebastian (May 2024, International Conference on Learning Representations (ICLR))

Full Text Available
Zoology: Measuring and Improving Recall in Efficient Language Models

Arora, Simran; Eyuboglu, Sabri; Timalsina, Aman; Johnson, Isys; Poli, Michael; Zou, James; Rudra, Atri; Ré, Christopher (May 2024, Proceedings of 12th International Conference on Learning Representations (ICLR))

Full Text Available
Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

Fu, Daniel Y.; Arora, Simran; Grogan, Jessica; Johnson, Isys; Eyuboglu, Sabri; Thomas, Armin W.; Spector, Benjamin; Poli, Michael; Rudra, Atri; Ré, Christopher (December 2023, Proceedings of the 36th Neural Information Processing Systems Conference (NeurIPS))

Full Text Available
How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections

Gu, Albert; Johnson, Isys; Timalsina, Aman; Rudra, Atri; Ré, Christopher (May 2023, Proceedings of the 11th International Conference on Learning Representations (ICLR))

Full Text Available
Hungry Hungry Hippos: Towards Language Modeling with State Space Models

Dao, Tri; Fu, Daniel Y.; Saab, Khaled K.; Thomas, Armin W.; Rudra, Atri; Ré, Christopher (May 2023, Proceedings of the 11th International Conference on Learning Representations (ICLR))

Full Text Available
Simple Hardware-Efficient Long Convolutions for Sequence Modeling

Fu, Daniel Y.; Epstein, Elliot L.; Nguyen, Eric; Thomas, Armin W.; Zhang, Michael; Dao, Tri; Rudra, Atri; Ré, Christopher (July 2023, Proceedings of the 40th International Conference on Machine Learning (ICML))

Full Text Available
Interpreting mental state decoding with deep learning models

https://doi.org/10.1016/j.tics.2022.07.003

Thomas, Armin W.; Ré, Christopher; Poldrack, Russell A. (November 2022, Trends in Cognitive Sciences)

Full Text Available
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Dao, Tri; Fu, Daniel Y.; Ermon, Stefano; Rudra, Atri; Ré, Christopher (December 2022, Proceedings of the 35th Neural Information Processing Systems Conference (NeurIPS))

Full Text Available
Ember: no-code context enrichment via similarity-based keyless joins

https://doi.org/10.14778/3494124.3494149

Suri, Sahaana; Ilyas, Ihab F.; Ré, Christopher; Rekatsinas, Theodoros (November 2021, Proceedings of the VLDB Endowment)

Structured data, or data that adheres to a pre-defined schema, can suffer from fragmented context: information describing a single entity can be scattered across multiple datasets or tables tailored for specific business needs, with no explicit linking keys. Context enrichment, or rebuilding fragmented context, using keyless joins is an implicit or explicit step in machine learning (ML) pipelines over structured data sources. This process is tedious, domain-specific, and lacks support in now-prevalent no-code ML systems that let users create ML pipelines using just input data and high-level configuration files. In response, we propose Ember, a system that abstracts and automates keyless joins to generalize context enrichment. Our key insight is that Ember can enable a general keyless join operator by constructing an index populated with task-specific embeddings. Ember learns these embeddings by leveraging Transformer-based representation learning techniques. We describe our architectural principles and operators when developing Ember, and empirically demonstrate that Ember allows users to develop no-code context enrichment pipelines for five domains, including search, recommendation and question answering, and can exceed alternatives by up to 39% recall, with as little as a single line configuration change.
Full Text Available
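
The Ember abstract above describes a keyless join operator built on an index of task-specific embeddings. The following is a minimal sketch of that general idea, not Ember's implementation: records from two tables are matched by cosine similarity of their embedding vectors instead of a shared key. The embeddings here are hand-written placeholder vectors (Ember learns them with Transformer-based models), and the names `cosine` and `keyless_join`, the record strings, and the brute-force search in place of a real index are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def keyless_join(left, right, k=1):
    """Match each left record to its k most similar right records.

    left, right: dicts mapping a record label to its embedding vector.
    Returns a dict mapping each left label to a list of k right labels.
    """
    result = {}
    for left_label, left_vec in left.items():
        # Brute-force ranking; a real system would query a vector index.
        ranked = sorted(
            right,
            key=lambda label: cosine(left_vec, right[label]),
            reverse=True,
        )
        result[left_label] = ranked[:k]
    return result

# Placeholder embeddings; real ones would come from a trained encoder.
products = {
    "acme laptop 15in": [0.9, 0.1, 0.0],
    "zen tea kettle": [0.0, 0.2, 0.9],
}
reviews = {
    "great notebook computer": [0.8, 0.2, 0.1],
    "boils water fast": [0.1, 0.1, 0.95],
}
matches = keyless_join(products, reviews)
```

With these placeholder vectors, each product is paired with the review whose embedding points in the most similar direction, even though the two tables share no key column.
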
